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Abstract 

Mathematical modeling of epidemic propagation on networks is extended to hy¬ 
pergraphs in order to account for both the community structure and the nonlinear 
dependence of the infection pressure on the number of infected neighbours. The exact 
master equations of the propagation process are derived for an arbitrary hypergraph 
given by its incidence matrix. Based on these, moment closure approximation and 
mean-field models are introduced and compared to individual-based stochastic simula¬ 
tions. The simulation algorithm, developed for networks, is extended to hypergraphs. 
The effects of hypergraph structure and the model parameters are investigated via 
individual-based simulation results. 
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1 Introduction 


Spreading processes on networks has a well-developed theory, based on the simple idea that 
the higher connectivity of the network enhance the spreading process. In the language 
of epidemic propagation this means that infection pressure on a susceptible individual 
increases with the number of infectious neighbours. Typically it is assumed that the 
probability of infection is proportional to the number of infectious neighbours. This idea 
leaded to different propagation models from exact master equations [Ml IT] to different 
types of mean-field models, such as homogeneous and heterogeneous pairwise models |8l 
EKm], effective degree models m, edge based compartmental models [TTj and individual 
based models naiH], mentioning only the most widely used ones. It is also well-known 
that the community structure has strong impact on the spread of the epidemic. There are 
many publications studying household structure, workplaces, schools. 

A real modeling framework of epidemic propagation has to take into account that (a) 
the community is built up from small units, such as households and workplaces and (b) the 
infection pressure on a susceptible individual in a unit is not proportional to the number of 
infected individuals. For example, the increase of the number of infected individuals from 5 
to 10 in a workplace with 20 individuals does not necessarily mean that the infection prob¬ 
ability is doubled in that place. The aim of this paper is to develop the theory of epidemic 
propagation on hypergraphs that enables us to model both the nonlinear dependence of 
the infection pressure and the community structure. The modeling paradigm is that the 
population consists of nodes of a hypergraph that is given by the hyperedges, which are 
simply subsets of the vertex set. For example, each household and each workplace can 
be considered as a hyperedge. The usual graphs can be considered as the special case of 
hypergraphs, when each hyperedge consists of two vertices that form an edge of the graph. 
In the widely-used propagation models the rate of infection for a susceptible node in a 
household is tit, where n is the number of its infectious neighbours in the household and 
r is the per contact infection rate. In our new model, to be developed in this paper, it is 
/(n)r with some possibly non-linear function / dehned later. This function describes that 
the infection pressure is discounted as the number of infectious individuals is increased. If 
an individual belongs to two or more hyperedges, e.g. to a household and to a workplace, 
then the rate of infection is the sum of the rates in each hyperedge, e.g. (/(ui) -|- /(n 2 ))r, 
if the individual has ni infected neighbours at home and n 2 at the workplace. It is impor¬ 
tant to note that each hyperedge can be replaced by a clique (fully connected complete 
graph) and then the propagation can be considered on a graph in the conventional way 
by discounting the infection pressure for large number of infectious neighbours. In that 
case the infection rate will be f{ni + n 2 )r instead of (/(ni) -|- /(n 2 ))r, hence it cannot be 
distinguished that somebody has many infectious neighbours at home and only a few in 
the workplace, or a moderate number at both places. 

Besides developing the theory of epidemic propagation on hypergraphs we will run 
simulations on hypergraphs, derive the exact master equations and present a mean-field 
ODE approximation. For the purpose of simulations we created hypergraphs in several 
different ways. On one hand, the set of vertices was once partitioned into households and 
once into workplaces randomly, in this case each node belongs to exactly two hyperedges. 
As a second alternative, a Barabsi-Albert random graph was generated and the cliques 
were determined by a suitable algorithm. These cliques were considered as the hyperedges 
of the newly created hypergraph. This is motivated by the fact that in data mining it is 
common practice to use different algorithms for identifying cliques as communities from 
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given data of binary relations (i.e. a graph). Finally, we extended the configuration model 
to generate random hypergraphs with prescribed number and size of hyperedges. 

The effect of community structure on the propagation process has been studied in the 
literature recently, since non-trivial community structure occurs not only in epidemiology 
but also in other systems in biology, computer science and engineering. Epidemic spreading 
on networks with overlapping community structure is considered in [7], allowing that the 
rate of infection is different in the communities. Clique networks describe the phenomenon 
of individuals attending different groups in the form of multiple clique types in |19] , where 
theoretical analysis with a predefined clique degree distribution is performed. The authors 
of m conclude that modeling with hypergraphs may result in a more precise description 
of biological processes, moreover, they foresee that ’’applications of hypergraph theory in 
computational biology will increase in the near future”. A voter-like model is investigated 
for interacting particle systems on hypergraphs in [12]. In their model large blocks of 
vertices may flip simultaneously, since vertices in a hyperedge change simultaneously their 
opinion to the majority opinion of the hyperedge. 

The paper is structured as follows. In Section 2 our model for SIS epidemic prop¬ 
agation on hypergraphs is introduced. Different methods for creating hypergraphs are 
presented in Section 3. The simulation results for epidemic propagation on different hy¬ 
pergraphs are shown in Section 4. In Section 5 we extend our theory |16] for deriving exact 
master equations to the case of propagation on hypergraphs. Exact and approximating 
mean-field equations for the expected value of the number of infected nodes are derived 
and compared to simulations in Section 6 . 

2 Model formulation 

Our mathematical model starts from a hypergraph that consists of a set of nodes V = 
{ui, U 2 ,..., vn} and a set of hyperedges £ = {ei, 62 ,..., cm} where each hyperedge is a 
subset of V, that is C E for alH = 1, 2,..., M. The pair (E, £) is called a hypergraph. 
The nodes in our model represent the individuals and the hyperedges are the units of the 
community structure such as households or workplaces. The methods for creating hyper¬ 
graphs will be dealt with in the next section, here (V,£) denotes a hypergraph in general. 
The process considered in this paper is SIS (susceptible-infected-susceptible) epidemic 
propagation. This means that each node may be in one of the two states susceptible or 
infected/infectious. These states will be denoted by S and I, respectively. A susceptible 
individual can become infectious after contacting infectious ones and an infectious one 
can recover and become susceptible again after some time (not depending on the states 
of its neighbours). Both infection and recovery are governed by a Poisson processes. This 
means that an infected individual recovers with probability 1 — exp(—yAt) in a small time 
interval At, where 7 is called the recovery rate. A susceptible individual becomes infected 
with probability 1 — exp(—rAt) in a small time interval At. The rate r is given as 

r = T'^f{kh), 

h 

where the summation is for those hyperedges h & £ that contain the susceptible node, 
kh denotes the number of infected nodes in the hyperedge h and / is a given function. 
We note that in the well-known case of propagation on graphs / is the identity function, 
i.e. f{kh) = kh, yielding the infection rate in the form rk, where k is the total number 
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of infected neighbours. The choice of function / is artificial in this paper. The typical 
functional form is an inverse tangent like function that is close to the identity around zero 
and becomes constant for large values of its argument. The behaviour of the propagation 
process can be tested numerically for a large class of / functions. A careful analysis shows 
that the qualitative behaviour of the process can be understood by using the simplest 
piece-wise linear function 


fix) 


X, if 0 < X < c 

c, if X > c. 


( 1 ) 


This function is parametrized by a single parameter c, which can be interpreted as a 
threshold value. If the number of infected nodes in a hyperedge is smaller than c, then 
the infection is proportional to the number of infected neighbours as in the conventional 
network case, while above this threshold value the number of infected nodes in a hyper¬ 
edge does not increase the infection pressure. We note that the desirable approach for 
quantitative analysis would be to determine the functional form and parameters of / by 
fitting it to real epidemic propagation data. This approach is beyond the scope of this 
paper and may be the subject of future work. 

Thus our model is specified by choosing a hypergraph and the function /. Then the 
full mathematical model is a continuous time Markov chain with a state space of size 
2^, since to determine the state of the system it has to be given for each node if it is 
susceptible or infected. The hyperedges of the hypergraph and the function / determine 
the transition probabilities of the Markov chain as it will be presented in Section [5j The 
full set of master equations of this Markov chain are practically impossible to solve because 
of the extremely large size of the system, hence individual-based simulations are carried 
out and the dependence of the system behaviour on the hypergraph structure and on the 
function / is investigated via simulations, the results of which are shown in Section [H 
Before turning to simulation results let us review the methods that were used to create 
our hypergraphs. 


3 Hypergraph types used in the simulations 

As we have mentioned in the introduction simulations were preformed on hypergraphs 
generated in three different ways. In this section we introduce these models. 

3.1 Bi-uniform hypergraph model 

In the first model, each individual (a vertex of V) belongs to precisely two edges of the 
hypergraph, one will correspond to the household of the individual and the other to the 
workplace of the person. It is assumed that the households are disjoint, and each person 
works in one workplace. Thus the set of edges corresponding to the households, H, form 
a partition of V. For sake of simplicity we also assume that each household has precisely 
H members, thus is a H-uniform hypergraph consisting of disjoint edges of size 

H. With similar assumptions, the edge set of the workplaces, W, is a partition of V 
into W-element sets. We generate the partitions Ji and W randomly: always pick the 
next element to the next edge with uniform probability from the remaining unpartitioned 
elements. Finally we take the union of the two uniform hypergraphs, and obtain a special 
bi-uniform hypergraph iV^'H U W). 
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3.2 BA-cliques model 

The second model is based on the preferential attachment model invented by Barabsi and 
Albert [ 1 ]. They described a randomized algorithm that constructs graphs that are nowa¬ 
days considered to be one of the best models for numerous natural and human-made struc¬ 
tures, including social networks. There are several slightly different ways of implementing 
this algorithm, however in each case one obtains a graph with similar characteristics. 

In m it was shown that the number of vertices with degree at least k is and in [5] 
it was proved that the diameter of such a graph is asymptotically log re/log log n. These 
results match the measures of many natural networks. 

In our case, first we generated a random graph using the above method with our own 
implementation, then it was converted to a hypergraph on the same vertex set. The graph 
itself represents only the structure of binary relations, however, we need a representation 
of the structure of some communities, consisting of more members. We assume that each 
member of a community is in relation with each other member of the community, so 
members of a community forms a complete subgraph (i.e. there is en edge between any 
pair of members). Therefore to find these communities we listed all complete subgraphs 
of our graph using the software called CFinder [ 6 ], these are the edges of our hypergraph. 
In this way we obtain many small size hyperedges and few large ones [2]. 

It is worth mentioning that this algorithm has an exponential worst case running time 
for general graphs as there may be exponentially many cliques. Fortunately one can show 
that for graphs obtained by the preferential attachment model this is very unlikely, and 
the algorithm is most likely efficient in this case. 

3.3 Configuration model 

In this model our aim is to generate a random, uniform, regular hypergraph on given 
number of vertices. Therefore, we fix the number of vertices {N), the size of each hyperedge 
(e) and the number of edges each vertex is contained in (i.e. the degree of each vertex, d). 
Simple double counting shows that in this way the number of hyperedges is M = Nd/e. In 
a more general setup, one can start with a prescribed size for each hyperedge individually, 
and a prescribed degree for each vertex. For example, we can set half of the edges to have 
size Cl and the other half size 62 , while half of the vertices have degree di and the other 
half d 2 (see Fig. [ 6 l) 

All hypergraphs can be associated with a bipartite graph (i.e. a 2-colorable graph) 
in the following way. Let V be the set of vertices of the hypergraph, this will be one 
of the color classes in the bipartite graph. The other color class will consist of vertices 
corresponding to each edge of the hypergraph ([/). A vertex in V is adjacent to vertex in 
U if the vertex in contained in edge corresponding to the vertex in U. Since in our first 
case every vertex is contained in d hyperedges, each vertex in V has degree d, and since 
each edge contains e vertices, every vertex in U has degree e. In the general case, vertices 
of V have the prescribed degrees, and the degree of the vertices in U will be the prescribed 
size of the corresponding edge. 

Thus it is enough to generate such a random bipartite graph. For this we used the 
configuration model of Bollobas [3]. For each vertex in the bipartite graph, take the 
prescribed number of half-edges, then randomly combine two such half edges from the 
opposite side. In this way parallel edges may occur. It is shown in [3] that the expected 
number of parallel edges is relatively small, especially if the prescribed degrees are small 
compared to N. So we simply chose to delete any parallel edges. As a result a few edges of 
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the corresponding hypergraph contain less than e vertices, and a few vertices are contained 
in less than d edges. However, it has a negligible effect on the simulation results. The 
situation is similar in the general case. 

4 Simulation results 

4.1 Description of the simulation algorithm 

Let x{t) G {0,1}^ denote the state of the system at time t. Its k-th coordinate, Xk{t), 
is 0, if the fc-th node is susceptible and 1, if it is infected. The hypergraph is given by 
its incidence matrix J', the rows of which correspond to the nodes and the coloumns 
correspond to the hyperedges. That is J'ij = 1 if node i belongs to the j-th hyperedge and 
it is zero otherwise. Then the product x{t)J' gives the number of infected nodes in the 
different hyperedges, that is its j-th coordinate, {x{t)J)j, is the number of infected nodes 
in the j-th hyperedge. Since infection and recovery are governed by Poisson processes, a 
susceptible individual, which has kh infected neighbours in hyperedge h, becomes infected 
with probability 1—exp (—r f(kfi)At) in a small time interval At. Similarly, an infected 
individual recovers with probability 1 — exp(—yAt) in a small time interval At. Thus, 
assuming that node i is susceptible at time t, i.e. Xj(t) = 0 , the rate at which it becomes 
infected is 

M 

We apply the widely used individual-base stochastic simulation as for conventional net¬ 
works. At a given time instant t a vector r G [0,1]'^ is generated with random numbers, 
then the algorithm runs through all the nodes from i = 1 to i = N. If the i-th node is 
susceptible, i.e. Xi{t) = 0, then it becomes infected at time t + At, if 

Jijf{{x{t)J)j)A)j . 

If the i-th node is infected, i.e. Xi{t) = 1, then it becomes susceptible at time t + At, if 

Xi < 1 — exp (— 7 At). 

This process is run with sufficiently small time steps At until the final time tmax is reached. 
We note that running the simulation by using the Gillespie algorithm we obtain completely 
similar results, when At is chosen sufficiently small. Then several simulations, started with 
the same initial condition, are averaged. The simulation results are dealt with in the next 
subsections. 

4.2 The effect of function / 

Here it is studied how the propagation process is affected by the choice of the function /. 
As it was declared in Section 2, in this paper we use the function / given in ([1]). This 
function is parametrised by a single parameter c, which can be interpreted as a threshold 
value in the following sense. If the number of infected nodes in a hyperedge is smaller 
than c, then the infection is proportional to the number of infected neighbours as in the 


( M 

—T 

i=l 
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conventional network case, while above this threshold value the number of infected nodes 
in a hyperedge do not increase the infection pressure. 

Consider first the case of bi-uniform hypergraphs constructed from households and 
workplaces introduced in Subsection 13.11 A random hypergraph was constructed with 
households of size H = 5 and workplaces of size W = 10, where each node belongs 
to exactly two hyperedges, a household and a workplace. Then simulations were run 
with recovery rate 7 = 1 and infection rate r = 0.18 for different values of c. The 
time dependence of the number of infected nodes is shown in Figure [2] for three different 
values of c. One can see that for greater values of c we face stronger epidemic as it is 
expected. In order to compare these simulation results to the usual simulation on graphs 
we created a graph from the hypergraph by exchanging hyperedges to cliques, i.e. to 
complete subgraphs. In other words, each node in a hyperedge is connected to every other 
node of that hyperedge with usual edges. It may happen that two nodes are in the same 
household and in the same workplace, then two edges are created between them in the 
course of the above process. In this case the two edges are weighted with 2, in order to 
make the new network comparable to the original hypergraph. This way a weighted graph 
is created from the hypergraph. Then simulations were run on this weighted graph as well 
and the time dependence of the number of infected nodes is compared to that obtained 
from the simulations on the hypergraph. Figure [2] clearly shows that for large values of c 
the process on the hypergraph is basically the same as the process on the corresponding 
conventional (weighted) network. This is explained by the simple fact that for c = 10 the 
discount effect of the function / cannot come to play. Namely, there are no more than 
10 nodes in the hyperedges, hence f{k) = k for those values of the number of infected 
neighbours k that can occur in this hypergraph. As it was already mentioned in Section 
2, in the course of a quantitative approach this Figure would enable us to fit the value of 
parameter c by comparing real data to the curves obtained from simulations for different 
values of c. 

Consider now the case of hypergraphs created from the cliques of Barabsi-Albert net¬ 
works, see Subsection ESI We constructed a network with N = 500 nodes by using the 
preferential attachment model with m = 4. Then the cliques were determined and sub¬ 
stituted by hyperedges. Simulations were run with recovery rate 7 = 1 and infection rate 
T = 0.02 for different values of c. The time dependence of the number of infected nodes is 
shown in Figure El for three values of c. One can again see that for greater values of c we 
face stronger epidemic as it is expected. The smallest value, c = 3, is below the average 
hyperedge size, hence the effect of the function / can be clearly seen. Namely, the infection 
is smaller than on the weighted graph, which is created from the hypergraph by changing 
hyperedges to cliques. On the other hand, for the largest value, c = 8 , which is greater 
than the average size of the hyperedges, the hypergraph structure has negligible effect 
in the steady state compared to the conventional propagation on the weighted network. 
More importantly, we can see that the hyperedge model has strong effect in the early stage 
of the epidemic, when large cliques are infected probably, for which the hyperedge size is 
larger than these values of c. 

Consider now the case of random hypergraphs created by the configuration model, see 
Subsection E31 We constructed a regular hypergraph, in which all hyperedges contain 
e = 10 nodes and the degree of each node is d = 8 , i.e. each node belongs to 8 hyperedges. 
Then simulations were run with recovery rate 7 = 1 and infection rate r = 0.05 for different 
values of c. The time dependence of the number of infected nodes is shown in Figured for 
two values of c. One can again see that for greater values of c we face stronger epidemic as 
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it is expected. The smaller value, c = 5, is below the hyperedge size, hence the effect of the 
function / can be observed. Namely, the infection is smaller than on the weighted graph, 
which is created from the hypergraph by changing hyperedges to cliques. On the other 
hand, for the greater value c = 10, which is the same as the size of the hyperedges, the 
hypergraph structure has no effect compared to the conventional propagation on network, 
since f{k) = k for those values of the number of infected neighbours k that can occur in 
this hypergraph. 

4.3 The effect of the structure of the hypergraph 

Now we fix the function /, that is fix a value of c and investigate how the parameters of the 
hypergraph affect the spreading process. Consider first the case of bi-uniform hypergraphs 
constructed from households and workplaces in Subsection [3Tj A random hypergraph was 
constructed with households of size H and workplaces of size W, where each node belongs 
to exactly two hyperedges, a household and a workplace. Then simulations were run with 
recovery rate 7 = 1 and infection rate r = 0.18 for different values of H and W. Figured 
shows that the number of infected nodes during the spreading process depends in a non¬ 
trivial way on the values of H and W. One can observe that increasing the size of the 
hyperedges, for example, comparing the case Ff = 5, fF = 10 to the case H = 10, W = 10, 
the infection becomes stronger. On the other hand, comparing the case H = 5, W = 20 
to the case H = 10, W = 10 we can say that on a more heterogeneous (in the sense of 
hyperedge sizes) hypergraph the propagation starts faster, but it can settle at a smaller 
steady state value than in the case of a more homogeneous hypergraph. 

Let us turn to the study of the effect of degree heterogeneity. A hypergraph is con¬ 
structed with M hyperedges, half of them is of size ei and and the other half is of size 62 - 
Half of the N nodes has degree di, i.e. belong to di hyperedges, and the half of them is 
of degree d 2 - We note that the conservation relation ^(ei -|- 62 ) = ^{di + ^ 2 ) must hold. 
In Figure E] the time dependence of the number of infected nodes is shown for different 
degree distributions. We can see that for a more homogeneous degree distribution the 
spreading starts slower but it ends at a higher value. It is also important to note that for 
the most heterogeneous case, when the size ei of the small hyperedges is less than c, the 
discount effect of the function / applies only for the large hyperedges, while in the most 
homogeneous case with ei = 15, 62 = 25 the function / affects all hyperedges. 

The effect of heterogeneity is also shown in Figure [71 where the spread on a regular and 
a bimodal hypergraph is compared to that on a hypergraph with five different hyperedge 
sizes. The hypergraph have N = 500 nodes and consist of M = 400 hyperedges. The 
number of hyperedges is the same in each size category, i.e. for the bimodal hypergraph 
there are 200 hyperedges with both sizes ei and 62 and for the hypergraph with five 
hyperedge sizes there are 80 hyperedges with sizes e* (z = 1,2, ...,5). One can again 
observe that the fastest spread is on the most heterogeneous hypergraph in the initial 
phase of the process, while this network leads to the least prevalence in the final stage of 
the process. 

5 Master equation 

Here the exact master equations of SIS epidemic propagation on an arbitrary hypergraph 
with N nodes are derived. The master equations form a linear system of ordinary dif¬ 
ferential equations, the coefficients of which are the transition rates from one state to 
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another. 

The state space of the process is {S,I}^ containing 2^ elements, since each node can 
be in one of two states S or I. The state space is divided into + 1 classes according to 
the number of infected nodes. Let denote the state where every node is susceptible, i.e. 
5° = {S, S,..., S). Let be the subset with states having k infected nodes, containing 
Cfc = (T) states. Finally, let be the state in which every node is infected, i. e. 
5^ = 

The elements of are denoted by S^, ..., Let Sj{l) be the type of the /-th 

node in the state Sj, so that the value of Sj{l) is either S or I. The state of the system 
can change in two ways: 

1. Infection: a susceptible node becomes infected, this is an Sj —)■ transition, 

where i and j are such that there exists I, for which Sj{l) = S, = I and 

Sj{m) = for every m ^ 1. Furthermore, node I has an infected neighbour, 

which can be formally expressed as follows: there exists an r 7 ^ /, such that S^{r) = I 
and I and r are in the same hyperedge. 

2. Recovery: an infected node becomes susceptible, this is an —>■ transition, 

where i and j are such that there exists I, for which Sj{l) = I, = S and 

Sj{m) = for every m ^ 1. 

Let Xj{t) denote the probability that the system is in state Sj at time t. Let 

denote the probability of states containing k infected nodes, where A: = 0,1,..., A^. The 
above transitions define a system of linear differential equations with constant coefficients 
for known as the Kolmogorov equations or master-equations. The number of the 

infectious nodes changes by one at most in each time step, thus the master-equation can 
be written in the following block tridiagonal form: 

Xf^ = +B’^X^+ C'^X'^+\ k = 0 ,l,---,N, ( 2 ) 

where and are zero matrices. In matrix form: 

X = PX, 

where 

0 0 0 0 

A^ 0 0 0 

0 ^2 ^2 Q Q 

0 0 A^ B^ 0 

0 0 . A^ B^ 

The matrices A^ describe infection and describe recovery. The structure of the network 
is reflected by the matrices A^. 

Let A^ ■ denote the (i, j)-th element of A^, which shows the transition rate from state 
to state Sf. The class has c^-i elements and the class consists of terms. 
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therefore the matrix has rows and c^-i columns. The entry is non-zero if 
and only if Sj~^ and differ only at one position. Let the Lth node be this one, i.e. 
S^-Hl) = S, S^{1) = I and (m) = S^{m) for every m ^ 1. Furthermore, there exists 
a number r ^ I, such that = I and I is in the same hyperedge as r. Then the 

transition rate is given as 

<3 = ^ E / , (3) 

h: l£h 

where Nh{Sj~^) denotes the number of infected nodes in hyperedge h in the state Sj~^ 
and / is the function given in ([T]). Summing these equations for i one obtains for every 
j E {1,2,... ,Cfc_i} that 

(4) 

i=l 

where 

'V|;(-Sf)= E E / ■ P) 

l.S^{l)=S h-lGh 

That is Ngj{Sj) denotes the sum of the values Ylh-i&h for susceptible nodes 

in state Sj. 

Let denote the element in the Lth row and j-th column of the matrix C^. This 
gives the transition rate from state to state S^. The class has c^+i elements 

and the class has terms, therefore has rows and columns. The entry 
C^j is non-zero if and only if 5'^"'"^ and differ only at one node. Let this node be the 
/-th one, that is = I, Sf{l) = S and = S^{m) for every m ^ 1. In this 

case = 7 . The number of infected nodes in state 5'^'*'^ is k + 1, hence there are k + 1 
elements in the j-th column of the matrix C^, which are equal to 7 , all other entries are 
zero. Thus for every j E {1, 2,..., Cfc+i} we have 

J2ctj = -l(k + l)- ( 6 ) 

^=1 

The matrices are diagonal with Ck rows and coloumns. The elements of denote 
the rate of the —)• Sj type transition, which are non-zero if and only if i = j. Since the 
column-wise sum of the elements of P are zero, the matrix B^ is determined as follows: 


Cfc+I 


Cfe -1 


= - E - E cfr'- 

i=i i=i 


(7) 


As an example, the master equations are determined below for the hypergraph with 
= 4 nodes shown in Figured! The incidence matrix of the hypergraph is 


/ 1 0 0 \ 
1 1 0 
oil 

V 1 0 1 / 
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In this case the state space has 2^ elements and we have four classes according to the 
number of infected nodes: 


= Xssss, 

= {^sssi,Xssis,Xsiss,^isss), 

= {Xssil, XsiSI.XsilS, Xissi, Xjsis, Xiiss) , 

= {Xsiii,Xisn,Xiisi,Xiiis ), 

= Xiiii. 

The vector of probabilities is X = (X^,X^,X^,X^,X^) and the matrix P of the linear 
system of ODEs yielding the master equations is 

/ 50 0 0 0 \ 

i) 0 

P = 0 52 (^2 0 

0 0 

\ 0 0 0 / 

The submatrices are be given as follows 

/(I) /(I) 0 0 

/(I) 0 /(I) 0 

0 /(I) /(I) 0 

/(I) 0 0 /(I) 

0 0 0 0 
0 0 /(I) /(I) 




/ 2/(1) 2/(1) 2/(1) 0 0 0 

/(I) 0 0 /(I) 2/(1) 0 

0 /( 2 ) 0 /( 2 ) 0 /( 2 ) 

V 0 0 /(I) 0 2/(1) /(I) 

X" = r (/( 2 ),/(l) + /( 2 ), 2 /( 1 ),/(I) + /( 2 )), 
B° = (0), (7° = ( 7 , 7 , 7 , 7 ), 

/77070 0 \ 

7 O 7 O 7 O 
0 7 7 0 0 7 ’ 

\0 0 0 777/ 


/ 7 7 0 0 \ 

7 0 7 0 

^2 ^ 7 0 0 7 

0 7 7 0 ’ 

0 7 0 7 

\ 0 0 7 7 / 



For example, the elements of the third row of the matrix are 



( 0 ,r/( 2 ), 0 ,r/( 2 ), 0 ,r/( 2 )), 
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which describe the following rates of transitions: SSII IISI, SISI — IISI, SIIS 
IISI, ISSI — IISI, ISIS IISI, IISS IISI. The first, third and fifth transitions 
can not be realized, as the states differ in more than one node. During the second tran¬ 
sition, the first node becomes infected, hence we need to compute the term N^j{SISI). 
This is equal to /(2), because the first node is in a hyperedge, where it has two infected 
neighbours. The other transitions can be determined in a similar way. 

Using this method one can determine the master equations for an arbitrary hypergraph 
theoretically. However, we have to note that the master equations are useful mainly from 
the theoretical point of view, as the above construction can only be carried out practically 
for hypergraphs of moderate size, because of the huge number of equations. This motivates 
the derivation of approximating systems, called mean-field equations that will be dealt with 
in the next section. 

6 Mean-field theory 

6.1 Exact differential equations for the expected number of infected and 
susceptible nodes 

The main idea of mean-field theory is to consider some expected quantities instead of the 
probabilities of each individual state. The most important quantities are the expected 
number of infected and susceptible nodes that can be given as 

AT Cfc N Ck 

= E(^ - fc) EEw- (8) 

k=0 j=l k=0 j=l 

During the derivation of differential equations for these expected values below, we will 
need the quantity 

N Ck 

[SI] = ^^Nlj{S^)X^{t), (9) 

fc=0 j=l 

that can be considered as the generalization of the average number of SI edges defined in 
conventional networks. We remind that N^j is defined in ([2]). Now we are in the position 
to derive the exact differential equations for the expected values [I]{t) and [S]{t) starting 
from the master equations ([5]). 

Theorem 1 The expected values [I]{t) and [5](t) satisfy the following differential equa¬ 
tions for an arbitrary hypergraph. 

[5]=7[/]-r[5I], (10) 

[I]=t[SI]-^[I]. (11) 

Proof 

Introducing the notation S^ = (1, 1, ..., 1) we have 

Y,X'f = SkX^ 
i=i 
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hence ([5]) takes the form 


N N 

[/](t) = Y,kSkX\ [5](t) = Y,ik^-k)SkX\ (12) 

/c=0 k=0 

Equation ([7]) can be written as 

Bl, = - - [Sk-iC^-^),, 

and using that is a diagonal matrix we get 

Thus for every i = 1,..., Cfc 

holds that can be written as 

+ SkB>^ + Sk-iC’^-^ = 0, (13) 

holding for every A: = 0,1,..., A^, where A^^^ and C~^ are zero matrices. 

Differentiating the function [I]{t) and using the derivatives of X^ given by ([2]) leads to 

N N 

[j] = kSkX^ = Y kSk (a^X^-^ + B^X^ + = 

k={) k=0 

N N N-1 

= Y kSkA’^X’^-^ + Y kSkB^X^ + Y kSkC^X^+^ = 

fc=l fc=0 fc=0 

N-l N N 

= Yik + l)Sk+iA^^^X’^ + Y kSkB^X^ + Y^k - l)Sk-iC^~^X^ = 

k=0 k=0 k=l 

N 

= ^ ((fc + l)Sk+iA^^^ + kSkB^ + {k- l)Sk-iC^-^) X\ 
k=0 

Thus from equation (1131) we obtain the following differential equation: 

N 

[I] = Y {Sk+iA’^^^ - Sk-iC^-^) 

k=0 

Now, the desired equation, m can be derived by using the proposition below. The proof 
for [S]{t) is similar. 

Proposition 1 The matrices A^ and satisfy the following identities. 

1 . Sk-iC^-^ = 7kSk, 

N 

2 . Z Sk-iC>^-^X>^ = ^[I], 
k=0 
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3. 


N 

E Sk+iA^+^X^ 


k=0 




Proof 

From equation ([6|) one obtains 


- iC ‘ 


k—1 


Cfe-l 


i=l 


k-1 

d 


jk, 


for every j G {1, 2,... , Cfc}, therefore Sk-iC^~^ = ^kSk, proving the first part of the 
statement. 

The second part follows from the first one by using equation (d- 
To prove the last statement write equation (jH) as 


Cfc+I 


Sk+iA’^+^)=^A^+^ = TNfjiS^), 


2=1 


for an arbitrary j € {1, 2,... , c^}. Therefore 


N N Ck N Ck 

Y, Sk+iA^+^X^ = E E (*5^+1^"+') . ^ E E Nij{S^)X^{t) = r[5/] 

fc=0 fc=0 j=l ^ k=0j=l 

that we wanted to prove. 

Theorem [1] provides exact differential equations for the expected number of suscepti¬ 
ble and infected nodes, however, these differential equations are not self-contained, since 
the function [SI] is not known. The next step of the mean-field theory is to derive an 
approximation of [SI] in terms of [S'] and [/] in order to make the system closed. This is 
called a moment closure approximation, which is the subject of the next subsection. 


6.2 Closed mean-field equations and their comparison to simulation 


Our aim now is to derive an approximation to [SI] given in ([9]), which is based on the 
definition in ([5]). 

Consider first hypergraphs describing the network structure with households of size H 
and workplaces of size W. In these hypergraphs, a node belongs to exactly two hyperedges, 
a household and a workplace. Denoting the total number of infected nodes by I, the 
average number of infected neighbours of a node can be approximated by in a 

household and by in a workplace. Thus the second summation in ([2]) consists of 

two terms and is approximated as 


/ 


H -I 

N ' 


+ / 


fW -I 

\Tr 



Since this is independent of I, the double sum in ([5]) reduces to 



(N-k) 


f 


H - 1 

N ^ 


+ f 


/IT- 1 



because the number of susceptible nodes in state 5^ is Ai — k. Now ([U]) leads to the 
approximation 


[SI] 


f 


H -I 

N ' 


+ f 


fW-I 



N Cfc 


k=0j=l 
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(TV-[/]). 


/ 


H -I 
N ^ 


+ f 


/W-1 


Thus equation (fTT]l can be approximated as 


i = t{n -1) 


H -1 

N ■ 




-ih 


(14) 


The solution of this equation is compared to simulation in Figure [8] for two different 
hypergraphs. We can observe that the mean-field approximation gives better agreement 
when the hypergraph is homogeneous, i.e. for the case H = IQ, W = 10. 

Consider now regular random hypergraphs, in which each node belongs to d hyperedges 
and each hyperedge is of size e. Denoting the number of infected nodes by I, the average 
number of infected neighbours of a node in a hyperedge can be approximated by (e — 1)^. 
Thus the second summation in Q consists of a single term and can be approximated as 
df ((e — l);^)- Since this is independent of I, the double sum in ([5]) reduces to 

Nlj{S^)^{N-k)df(^^iy 

because the number of susceptible nodes in state Sj is N — k and each node belongs to d 
hyperedges. Now Q leads to the approximation 

/ ^ \ ^ /l\ 

\si ]» df = df ‘-j-i) (iv-[/]), 

^ ^ A:=0i=l ^ ^ 

Thus, for regular random hypergraphs, equation (fTT|) can be approximated as 


i = T{N-i)df 



-fl. 


(15) 


The solution of this equation is compared to simulation in Figure [9] for two different 
values of c. We can see that for regular random hypergraphs the mean-field approximation 
performs well and for the steady state it gives excellent agreement. Similarly to the case 
of networks, the solution of the mean-field equation increases faster than the simulated 
curve, but their steady states are close to each other. 


7 Discussion 

In this paper the methods of mathematical modeling of epidemic propagation on networks 
is extended to hypergraphs. The aim of this extension is to account for both the community 
structure and the nonlinear dependence of the infection pressure on the number of infected 
neighbours. The novelty of the model is that a susceptible individual is assumed to 
become infected with probability 1 — exp(—rAt) in a small time interval At with rate 
r = T'^^f{kh), where the summation is for those hyperedges h G £ that contain the 
susceptible node, kh denotes the number of infected nodes in the hyperedge h and / is 
a function given in ([T|). The simulation algorithm, developed for networks, is extended 
to hypergraphs to account for this new transition probability function. Individual-based 
stochastic simulations were run on three different types of hypergraphs, configuration 
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random hypergraphs, hypergraphs with hyperedges created from the cliques of a power- 
law random graph and bi-uniform hypergraphs the vertices of which are divided into 
households and workplaces randomly. The effects of hypergraph structure and the model 
parameters are investigated via individual-based simulation results. The exact master 
equations of the epidemic spreading are derived for an arbitrary hypergraph given by its 
incidence matrix. Based on these, moment closure approximation and mean-held models 
are introduced and compared to individual-based stochastic simulations. 

The paper is the hrst step in extending the mathematical modeling of epidemic spread¬ 
ing from networks to hypergraphs. There are several directions where this extension can 
be continued. One of these is to develop and investigate pairwise models, in which not 
only the expected value of susceptible and infected nodes, but also the extension of pairs, 
as introduced in ([91), are determined. The exact master equation enables us to use lumping 
for reducing the size of the system when the network has a special symmetry. The idea of 
lumping is also extendable to the case of hypergraphs. Our results show that the simple 
mean-field models developed in Section 16.21 performs well for homogeneous hypergraphs. 
The investigation of heterogeneous hypergraphs and the derivation and investigation of 
the corresponding heterogeneous mean-field models is also a challenging subject. The 
functional form and parameters of / in ([1]) could be determined based on real epidemic 
propagation data by using some htting procedure. This approach is beyond the scope of 
this paper and may be the subject of future work. 
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Figure 1: A simple hypergraph with 4 nodes and 3 hyperedges. 





Figure 2: Time dependence of the number of infected nodes for hypergraphs with house¬ 
holds of size H = 5 and workplaces of size W = 10. The curves belonging to different 
values of the parameter c in ([ 1 ]) are shown together with the prevalence corresponding to 
the propagation on a network with hyperedges substituted by complete subgraphs. The 
parameter values are N = 500, 7 = 1 and r = 0.18. 
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Figure 3: Time dependence of the number of infected nodes for hypergraphs created from 
the cliques of a Barabasi-Albert graph. The curves belonging to different values of the pa¬ 
rameter c in © are shown together with the prevalence corresponding to the propagation 
on a network with hyperedges substituted by complete subgraphs. The parameter values 
are N = 500, 7 = 1 and r = 0.02 . 



Figure 4: Time dependence of the number of infected nodes for a regular random hyper¬ 
graph, in which every node belongs to d = 8 hyperedges, each of which is of size e = 10 . 
The curves belonging to different values of the parameter c in ([T]) are shown together with 
the prevalence corresponding to the propagation on a network with hyperedges substi¬ 
tuted by complete subgraphs. The parameter values are N = 500, M = 400, 7=1 and 
r = 0.05. 


19 















350 



Figure 5: Time dependence of the number of infected nodes for hypergraphs with house¬ 
holds of size H and workplaces of size W. The parameter values are c = 5 in ([1]), = 500, 

7 = 1 and r = 0.18. 



Figure 6 : Time dependence of the number of infected nodes for bimodal random hyper¬ 
graphs, in which half of the nodes belong to di hyperedges and half of them belong to d 2 
hyperedges. The total number of hyperedges is M = 400, half of them are of size ei, the 
other half of them are of size 62 - The parameter values are c = 10 in ([1]), = 500, 7 = 1 

and r = 0.05. 
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Figure 7; Time dependence of the number of infected nodes for a regular hypergraph 
(continuous line), for a bimodal hypergraph (dashed line) and for a hypergraph with 5 
different hyperedge sizes (dashed-dotted line), with the same number of hyperedges in 
each category. The parameter values are c = 10 in ([T]), = 500, M = 400, 7 = 1 and 

r = 0.05. 



Figure 8 : Time dependence of the number of infected nodes for hypergraphs with house¬ 
holds of size H and workplaces of size W. The solution of the mean-field equation (|14l) is 
also shown. The parameter values are c = 7 in ([1]), = 500, 7 = 1 and r = 0.18. 
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Figure 9: Time dependence of the number of infected nodes for a regular random hyper¬ 
graph, in which every node belongs to d = 16 hyperedges, each of which is of size e = 20. 
The simulation curves belonging to c = 10 and c = 15 in ([1]) are shown together with 
the prevalence given by the mean-field equation (|15|) . The parameter values are N = 500, 
M = 400, 7 = 1 and r = 0.03. 
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